BIOMedical Search Engine Framework: Lightweight and customized implementation of domain-specific biomedical search engines

Authors

  • Alberto G. Jácome
  • Florentino Fernández Riverola
  • Anália Lourenço
Abstract

BACKGROUND AND OBJECTIVES Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines, offering an attractive way to create personalized and enhanced search experiences. This work therefore introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind the framework is to combine core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces.

METHODS The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the search engine to be integrated into existing systems, or the web interface to be personalized completely.

RESULTS The construction of the Smart Drug Search is described as a proof of concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and related topics. The users' keyword-based queries are transformed into concepts, and search results are presented and ranked accordingly. The semantic graph view portrays all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of a concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations meaningful to that particular scope of research. Conversely, indirect concept associations, i.e. concepts related through other intermediary concepts, can be useful to integrate information from different studies and look into non-trivial relations.

CONCLUSIONS The BIOMedical Search Engine Framework supports the development of domain-specific search engines. The key strengths of the framework are modularity and extensibility in terms of software design, the use of open-source consolidated Web technologies, and the ability to integrate any number of biomedical text mining tools and information resources. Currently, the Smart Drug Search holds over 1,186,000 documents, containing more than 11,854,000 annotations for 77,200 different concepts. The Smart Drug Search is publicly accessible at http://sing.ei.uvigo.es/sds/. The BIOMedical Search Engine Framework is freely available for non-commercial use at https://github.com/agjacome/biomsef.
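The three signals the abstract reads off the semantic graph (concept occurrence, direct co-occurrence, and indirect association through an intermediary concept) can be illustrated with a minimal Python sketch. This is not the framework's implementation: the sample documents, the function names `concept_graph` and `indirect_pairs`, and the choice to count co-occurrence per document are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotated documents: each set holds the concepts
# that a tagger found in one document (illustrative data only).
docs = [
    {"ciprofloxacin", "Pseudomonas aeruginosa", "biofilm"},
    {"ciprofloxacin", "Pseudomonas aeruginosa"},
    {"biofilm", "quorum sensing"},
]


def concept_graph(docs):
    """Count concept occurrences and pairwise co-occurrences per document."""
    occurrences = Counter()
    cooccurrences = Counter()
    for concepts in docs:
        occurrences.update(concepts)
        # Sort so each unordered pair always maps to the same tuple key.
        for a, b in combinations(sorted(concepts), 2):
            cooccurrences[(a, b)] += 1
    return occurrences, cooccurrences


def indirect_pairs(cooccurrences):
    """Return pairs that never co-occur but share an intermediary concept."""
    neighbours = {}
    for a, b in cooccurrences:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    indirect = set()
    for linked in neighbours.values():
        for a, b in combinations(sorted(linked), 2):
            if b not in neighbours[a]:
                indirect.add((a, b))
    return indirect


occurrences, cooccurrences = concept_graph(docs)
indirect = indirect_pairs(cooccurrences)
```

In this toy data, "ciprofloxacin" and "quorum sensing" never appear in the same document, yet both co-occur with "biofilm", so the pair surfaces as an indirect association of the kind the abstract describes.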


Similar resources

Getting to Know Wolfram|Alpha Computational Knowledge Engine and Its Applications in Biomedical Sciences

  Unlike other internet search engines, the Wolfram|Alpha Computational Knowledge Engine tries to provide the best answer to a question, or to compute an equation as correctly as possible, based on current knowledge. Therefore, given the unique characteristics of Wolfram|Alpha and its vast applications, the aim of the present article is to familiarize the biomedical scientists with...


Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...


BioinQA: metadata-based multi-document QA system for addressing the issues in biomedical domain

Despite the availability of a large amount of biomedical literature, extracting relevant information catering to the exact need of the user has been difficult in the absence of efficient domain-specific information retrieval tools. Biomedical question answering (QA) systems require special techniques to address domain-specific issues, since a wide variety of user-groups having different informati...


BioinQA: Addressing bottlenecks of Biomedical Domain through Biomedical Question Answering System

Recent advances in the realm of biomedicine and genetics in the post-genomics era have resulted in an explosion in the amount of biomedical literature available. Large textbanks comprising thousands of full-text biology papers are rapidly becoming available. Due to the gigantic volumes of information available and the lack of efficient and domain-specific information retrieval tools, it has become e...


Identifying Latent Semantics in High-Dimensional Web Data

Search engines have become an indispensable tool for obtaining relevant information on the Web. The search engine often generates a large number of results, including several irrelevant items that obscure the comprehension of the generated results. Therefore, the search engines need to be enhanced to discover the latent semantics in high-dimensional web data. This paper purports to explain a no...



Journal:
  • Computer methods and programs in biomedicine

Volume 131

Publication date: 2016